DOCUMENT RESUME

ED 025 740                                                    AL 001 373

By-Lyovin, Anatole
A Chinese Dialect Dictionary on Computer: Progress Report.
California Univ., Berkeley. Phonology Lab.
Spons Agency-National Science Foundation, Washington, D.C.
Report No-POLA-2-7
Pub Date Jun 68
Note-45p.; Paper in Project on Linguistic Analysis, Reports, Second Series, No. 7.
EDRS Price MF-$0.25 HC-$2.35
Descriptors-Cantonese, *Chinese, Computational Linguistics, Contrastive Linguistics, Diachronic Linguistics,
Dialects, Dialect Studies, Dictionaries, Japanese, Korean, *Mandarin Chinese, Phonology, *Regional Dialects,
Tone Languages
Identifiers-Annamese, Guang Yun, *Hanyu Fangyin Zihui, Ji Yun, Logographs

The use of computers makes possible analysis of the vast amount of data
available in recent dialect dictionaries and surveys and in the ancient Chinese rhyme
books, such as "Guang yun" and "Ji yun." Comparison of dialects can enable a
historical study of Chinese, a major language group outside the Indo-European area,
to offer "a more balanced perspective on the nature of sound change in human
language." The problems of coding are great, but once the coding system is
established, the encoding of materials can be shared by a number of institutions. The
coding of the seven Mandarin dialects in the "Hanyu fangyin zihui" is complete, and
preliminary tests of the computer program have shown it to be satisfactory. After
further testing and refinement, other dialect surveys, rhyme books, and Sino-Korean,
Sino-Japanese, and Sino-Annamese data can be added to the system. The author gives
details of the organization of the data, the coding system, the computer program,
and errors in the source data. Appendices give the actual computer code, flow charts
for the computer program, and a list of errors in the "Hanyu fangyin zihui."
Correspondence concerning POLA matters should be addressed to William S-Y. Wang,
Department of Linguistics, University of California, Berkeley, California 94720. (MK)






phonology laboratory - department of linguistics 

UNIVERSITY OF CALIFORNIA, BERKELEY 







Reports. Second Series, No. 7

June, 1968

Haruo Aoki. A note on glottalized consonants                                A1-A13
Kun Chang and Betty Shefts Chang. Vowel harmony in spoken Lhasa Tibetan     1-81
Margaret Lauritsen. A phonetic study of the Danish stød                     D1-D12
Anatole Lyovin. A Chinese dialect dictionary on computer: Progress report   C1-C43
Anatole Lyovin. Notes on the addition of final stops in Maru                L1-L22











Reproduction in whole or in part is permitted for any purpose of 
the United States Government. 




The Project on Linguistic Analysis is supported in part by the National
Science Foundation (Grant GS1430), the Office of Naval Research (Contract
N00014-67-A-0114-0005), and the Air Force (Contract F30602-67-C-0347).
It is administered through the Phonology Laboratory of the University of
California at Berkeley, which has its office in 51 Dwinelle Hall
(Telephone: 845-6000, Extension 1507).

Correspondence concerning POLA matters should be addressed to
William S-Y. Wang, Department of Linguistics, University of California,
Berkeley, California 94720.




A Chinese Dialect Dictionary on Computer: Progress Report*

Anatole Lyovin

University of California, Berkeley






*The preparation of this paper was supported by National Science Foundation
Grant GS1430.













Table of Contents

0. Introduction
1. The organization of the data in the Hanyu fangyin zihui and our coding scheme
2. General notes on the coding format
3. The computer program
   3.0. Purpose
   3.1. Organization
   3.2. Method
4. Errors in the zihui
Appendix I. Computer code for the Hanyu fangyin zihui
Appendix II. David Forthoffer. Flowcharts for the computer program
Appendix III. Hsin-I Hsieh. A list of the errata in the Hanyu fangyin zihui
References
Footnotes
















0. Introduction.¹

The project on computerizing Chinese dialect data had its inception
in the summer of 1966 at the University of California, Berkeley. At
that time, Professor William S-Y. Wang, in his two workpapers,² outlined
the need for a more viable method of handling the vast amounts of data
which are available in the Ancient Chinese rime books, for example,
Guang yun and Ji yun (the latter containing well over 50,000 logographic
entries), and in the dialect dictionaries and surveys which have been
recently compiled in China. These materials constitute a vast source of
data which has to be analyzed in greater detail in order that our
reconstruction of the earlier stages of the Chinese language may be more
accurate and complete. Furthermore, because until now the most impressive
achievements of comparative linguistics have been mainly in the field of
Indo-European, a clearer picture of the historical developments in another
major language group such as Chinese (which is structurally different
from Indo-European) would be crucial for a more balanced perspective on
the nature of sound change in human language.

However, the very vastness of the available data on Chinese has
until now been more of a hindrance than a help, since the difficulties
in tabulating and analyzing these data are almost insurmountable for
individual scholars, or even small groups of scholars. The solution to
this problem lies in the use of digital computers, which can handle great
amounts of data at great speed. Although the initial outlay of time and
effort involved is still by no means minimal even for computerizing a
single rime or dialect dictionary, it is much less time-consuming than
manual tabulation of information. Moreover, the task of coding different
source materials can be distributed to several groups or institutions,
which in return for the money and effort invested will eventually be able
to share in the information provided by the computer. In this way, we
can begin integrating the materials from diverse sources into a
computerized pool of information which will yield factual answers quickly
and accurately.

During the summer of 1966, two concrete steps were taken to explore
the problems involved in the coding of the data. First, Professor Wang
devised a coding system for the material in the Hanyu fangyin zihui³
(hereafter referred to as the zihui), which will be described below in
detail. Second, Charles N. Li, a graduate student at the University of
California, Berkeley, was assigned to devise a computer code for Chinese
logographs.

In October, 1967, work began on the coding of the seven Mandarin
dialects in the zihui. At the time of this writing the coding of the
above-mentioned portion of the zihui has been completed; the keypunching
of the IBM cards is still in progress, but a sufficient sample of the
data cards is available to test the computer programs which have already
been written.

After the computer programs have been sufficiently tested on the
Mandarin data and found compatible with our purposes, the rest of the
dialect data in the zihui will be coded and added. In the future, data
from other dialect surveys, rime books, Sino-Korean, Sino-Japanese, and
Sino-Annamese can also be added without any need for major revision of
the computer programs which have been written.









1. The organization of the data in the Hanyu fangyin zihui and our
coding scheme.

The zihui contains, roughly, 2700 Chinese logographs with their
Ancient Chinese classification according to the rime dictionaries Guang
yun and Ji yun, and their phonetic value in the following seventeen
Chinese dialects: Pekinese, Ji-nan, Xi-an, Tai-yuan, Han-kou, Cheng-du,
Yang-zhou (Mandarin dialects); Su-zhou, Wen-zhou (Wu dialects); Chang-sha,
Shuang-feng (Xiang dialects); Nan-chang (a Gan dialect); Mei-xian (a Hakka
dialect); Guang-dong (a Cantonese dialect); Xia-men, Chao-zhou (Southern
Min dialects); and Fu-zhou (a Northern Min dialect). (The logographs in
the zihui are arranged according to the Peking pronunciation by final,
initial, and tone.)

This information in the zihui is arranged in, roughly, 2700 columns;
each column contains nineteen cells. Cell 1 contains the Chinese
logograph, and sometimes the following information:

(1) If the same logograph appears twice in the zihui (i.e. if it has
more than one pronunciation according to its meaning or according to
a particular disyllabic word in which it appears as an element), the
compilers also list the expression in which the said logograph has a
particular phonetic value. For example, on page 117 we have 給, which is
pronounced kei in the Peking Mandarin expressions [logographs]; on page 64,
we encounter the same logograph with the phonetic value tɕi, which it has
in the word [logographs] (in the same dialect).

(2) Logographs which have the same phonetic value in all the dialects
listed, as well as the same classification in the Ancient Chinese rime
books (i.e. logographs completely homophonous with the main logograph in
Cell 1), are listed in a footnote.

The Coding of Cell 1:

Although the code for the logographs which was devised by Charles Li⁴
was ingenious, it still failed to present a system whereby a Chinese
logograph could easily be coded and decoded. Moreover, his code was not
very economical, in that it required long strings of code characters
which would substantially increase the time consumed in the keypunching
of the data. The Chinese telegraphic code⁵ was, therefore, adopted for
the coding of Cell 1. This latter code, although still not very
satisfactory in many respects, was found to be more useful for our purposes
than Li's code. In his final report (Li 1967), Li summarizes the problems
involved:

    'The problems involved in devising a character coding system
    are of three types. The first type of problems has to do with the
    establishment of an isomorphic relation between the codes and the
    characters; the second type is concerned with decoding; the third
    type is concerned with simplicity. Thus, a perfect coding system
    will provide a unique code for each character, a simple and easy to
    learn coding procedure, and a direct, simple decoding procedure
    that requires no reference to a dictionary. One may choose to solve
    only one type of problems and ignore the others. For example, the
    telegraphic code is a system which provides good solutions to
    problems of the first and third type, but completely ignores the
    problems of the second type. My coding system aims at a perfect
    solution to all the problems, but falls short of its aim. The system
    has been improved in its ability to provide a unique and a shorter
    code for each character. But, the coding and, therefore, the decoding
    procedures as well, since they are merely the inverse of each other,
    are getting more complicated. It seems that the only perfect
    solution to the coding problem is to employ mechanical means to
    perform both the coding and the decoding ...'

In the case of the logographs which appear more than once in the
zihui (see p. 3), we have added supplementary alphabetic symbols to the
telegraphic code to indicate that a particular logograph appears more than
once. For example, the telegraphic code for 給 is 4822; its first
occurrence (on p. 64) was accordingly coded as 4822A, whereas its second
occurrence (on p. 117) was coded as 4822B. (It should be noted that the
telegraphic code sometimes also contains an alphabetic code letter; for
example, [logograph] is 475A. However, since the telegraphic code always
contains four code characters, there will be no confusion between, say,
475A, and 0475A, 0475B: it is only the fifth code character that specifies
a case of multiple occurrence in the zihui.)
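
The rule that only a fifth code character marks a repeated logograph can be sketched in a few lines of Python. This is an illustration only, not the project's program; the function name split_cell1 and the error message are assumptions introduced here.

```python
from typing import Optional, Tuple


def split_cell1(code: str) -> Tuple[str, Optional[str]]:
    """Split a Cell 1 code into (telegraphic code, occurrence letter).

    The telegraphic code is always four code characters long; a fifth,
    alphabetic character marks one of several occurrences in the zihui.
    """
    if len(code) == 4:
        # Four characters: a plain telegraphic code, even if it ends in a letter.
        return code, None
    if len(code) == 5 and code[4].isalpha():
        # Five characters: telegraphic code plus occurrence letter (A, B, ...).
        return code[:4], code[4]
    raise ValueError(f"not a valid Cell 1 code: {code!r}")
```

With this rule 475A is read as a bare telegraphic code, while 0475A is read as code 0475 in its first occurrence.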

Finally, the homophonous logographs are coded separately within their
own matrices or columns. (In other words, they are treated as separate
entries.) We could have easily provided more parts for Cell 1, each part
containing the telegraphic code for the homophonous logographs. However,
we had to foresee the possibility that the logographs in question might
not turn out to be perfectly homophonous to each other in the dialects
covered by other dictionaries or surveys which will be coded in the future.

Cell 2 and its coding:

Cell 2 contains information on the phonological categories assigned
to each particular logograph by the Ancient Chinese rime books (Guang yun
and Ji yun). In a few cases where a logograph does not appear in these
rime books, Cell 2 is blank and is coded as zero.

This cell contains the following parts:

Part 1. 16 she (攝) or 'rimemes'. The 16 she are each coded by two
letters: the first letter denotes either nei zhuan (內轉) (N, O, P) or
wai zhuan (外轉) (W, X); the second letter denotes the ending (0, I, U,
M, N, G). (See Appendix I, Part I.a.)

Part 2. kai-kou vs. he-kou (開口, 合口). These are coded as
KAI and HE, respectively. (See Appendix I, Part I.b.)

Part 3. Four divisions or deng (等). These are coded as 1D, 2D, 3D,
4D. (See Appendix I, Part I.c.)

Part 4. Tone. 平 = 1, 上 = 2, 去 = 3, 入 = 4.
(See Appendix I, Part I.d.)

Part 5. Subrimes or yun (韻). There are about 189 yun. These are
coded numerically according to the she to which they belong. (See Appendix
I, Part I.e.)

Part 6. Initial or niu (紐) (40 categories). These are coded by an
alphabetical code based on Dong Tong-he's reconstruction of the Ancient
Chinese initials.⁶ (See Appendix I, Part I.f.)
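
As an illustration of how the six parts of Cell 2 fit together, the following Python sketch decodes one Cell 2 string. It is not the project's code: the names PART_RE, split_parts and decode_cell2 are hypothetical, and the tone table assumes the traditional order (1 = ping, 2 = shang, 3 = qu, 4 = ru), of which only the first two values are legible in this copy of the report.

```python
import re
from typing import Dict, Optional

# A part is "P.", one digit, then everything up to the next "P.<digit>".
PART_RE = re.compile(r"P\.(\d)(.*?)(?=P\.\d|$)")

# Assumed tone numbering (traditional order of the four tones).
TONES = {"1": "ping", "2": "shang", "3": "qu", "4": "ru"}


def split_parts(cell: str) -> Dict[int, str]:
    """Return {part number: coded value} for one cell."""
    return {int(number): value for number, value in PART_RE.findall(cell)}


def decode_cell2(cell: str) -> Optional[dict]:
    """Decode a Cell 2 string; a blank Cell 2 is coded as zero."""
    if cell == "0":
        return None
    parts = split_parts(cell)
    she = parts[1]
    return {
        "zhuan": "nei" if she[0] in "NOP" else "wai",  # first letter of the she
        "ending": she[1],                               # second letter of the she
        "kai_he": parts[2],                             # KAI or HE
        "division": int(parts[3][0]),                   # 1D..4D
        "tone": TONES[parts[4]],
        "subrime": int(parts[5]),                       # numbered within its she
        "initial": parts[6],
    }
```

For the Cell 2 string P.1NMP.2KAIP.33DP.44P.54P.6K this yields an inner-series she ending in M, kai-kou, third division, the fourth tone, subrime 4 and initial K.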






Cells 3-19:

These cells contain the information (in the IPA transcription) on the
phonetic value of each logograph in the following dialects:

Cell 3  = Peking (北京)
Cell 4  = Ji-nan (濟南)
Cell 5  = Xi-an (西安)
Cell 6  = Tai-yuan (太原)
Cell 7  = Han-kou (漢口)
Cell 8  = Cheng-du (成都)
Cell 9  = Yang-zhou (揚州)
Cell 10 = Su-zhou (蘇州)
Cell 11 = Wen-zhou (溫州)
Cell 12 = Chang-sha (長沙)
Cell 13 = Shuang-feng (雙峰)
Cell 14 = Nan-chang (南昌)
Cell 15 = Mei-xian (梅縣)
Cell 16 = Guang-zhou (廣州)
Cell 17 = Xia-men (廈門)
Cell 18 = Chao-zhou (潮州)
Cell 19 = Fu-zhou (福州)

As stated above, we have so far coded only the seven Mandarin dialects,
i.e. Cells 3 to 9 inclusive.

In each cell there may be more than one pronunciation of the
logograph: sometimes there are several variant readings of a particular
logograph in a single dialect. In many cases the difference among the
variant pronunciations is that between the colloquial pronunciation and
the reading or literary pronunciation. In the zihui, the contrast between
the colloquial and reading pronunciation is indicated by a single underline
and a double underline, respectively.

In our code each syllable is broken down into four parts:

Part 1. Tone. (For the symbols used, see Appendix I, Part II.a.)
Part 2. Initial. (For the symbols used, see Appendix I, Part II.b.)
Part 3. Vowel complex. (For the symbols used, see Appendix I, Part II.c.)
Part 4. Final consonant. (For the symbols used, see Appendix I,
Part II.d.)

Note: Zero initial and zero final are both coded as 0.

The contrast between the colloquial and the reading pronunciation
of a logograph is shown in the following manner: whenever there is a
contrast between these two types of readings, the colloquial
pronunciation is coded within parentheses. For example, the logograph
[logograph] in the Xia-men (Amoy) dialect has these two values: (a) literary
kian and (b) colloquial kĩ. In our code, these two syllables will be
symbolized as follows (□ represents a single space):

C.17□P.13P.2KP.3IAP.4N□(P.13P.2KP.3IZP.40)

If there are two or more colloquial variants, but no literary variants,
the colloquial readings are not enclosed in parentheses, but are merely
separated by one space.
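
The convention for variant readings can be sketched as follows in Python. The sketch is an assumption-laden illustration rather than the report's program: the function name parse_readings, the labels 'literary', 'colloquial' and 'unmarked', and the sample string (the Xia-men example as best it can be read from this copy) are all introduced here.

```python
import re
from typing import Dict, List

# A part is "P.", one digit, then everything up to the next "P.<digit>".
PART_RE = re.compile(r"P\.(\d)(.*?)(?=P\.\d|$)")
PART_NAMES = {"1": "tone", "2": "initial", "3": "vowel", "4": "final"}


def parse_readings(cell_body: str) -> List[Dict[str, str]]:
    """Parse the readings of one dialect cell (the text after 'C.n')."""
    readings = []
    for token in cell_body.split():
        in_parens = token.startswith("(") and token.endswith(")")
        syllable = token[1:-1] if in_parens else token
        reading = {PART_NAMES[n]: value for n, value in PART_RE.findall(syllable)}
        reading["style"] = "colloquial" if in_parens else "unmarked"
        readings.append(reading)
    # Parentheses appear only when a literary/colloquial contrast exists,
    # so in that case every reading outside parentheses is the literary one.
    if any(r["style"] == "colloquial" for r in readings):
        for r in readings:
            if r["style"] == "unmarked":
                r["style"] = "literary"
    return readings
```

Applied to P.13P.2KP.3IAP.4N (P.13P.2KP.3IZP.40), the function returns a literary reading with vowel complex IA and final N, followed by a colloquial reading with vowel complex IZ and zero final.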

The coding of syllabic nasals presents something of a conceptual
problem. For example, the logograph [logograph] has the phonetic value ŋ
(syllabic velar nasal) in the dialects of Su-zhou, Wen-zhou, Mei-xian, and
Guang-zhou. In this case, what is to be considered the initial, the vowel
complex, and the final? For this syllable we code zero initial, zero final,
and ŋ as the vocalic element. As far as our code is concerned, this
treatment accords with the general pattern: only nonvocalic segments can
appear as initials or finals. However, if we are interested in finding the
reflexes of the Ancient Chinese initial [logograph] (usually reconstructed
as ŋ), then the nature of the vocalic segment becomes relevant. Our program
must, therefore, instruct the computer not only to examine Part 2 of each
cell (in which the initial is coded), but also the contents of Part 3 in
each dialect which has syllabic nasals in its inventory.
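
A minimal Python sketch of the double check described above is given here. The code symbol 'NG' for the velar nasal is purely hypothetical (the actual symbols are those of Appendix I), as are the function name and the dictionary layout of a syllable.

```python
from typing import Dict


def is_velar_nasal_reflex(syllable: Dict[str, str], nasal: str = "NG") -> bool:
    """True if a coded syllable continues an initial velar nasal.

    The nasal may be coded in Part 2 (as an ordinary initial) or, for a
    syllabic nasal, in Part 3 with zero initial and zero final.
    """
    if syllable["initial"] == nasal:
        return True
    return (
        syllable["initial"] == "0"
        and syllable["final"] == "0"
        and syllable["vowel"] == nasal
    )
```

Checking Part 2 alone would miss every syllabic nasal, which is why the second test on Part 3 is needed.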

























2. General notes on the coding format.

In our code each column of the zihui begins with an asterisk and ends
in a slash. The asterisk is always punched in column 1 of an IBM card.
A cell is specified by C.X, where X stands for the number of the cell.
For example, cell one is coded C.1, cell seven is C.7. The cell address,
C.X, is always preceded and followed by at least one blank space, except
for C.1, which is immediately preceded by an asterisk. Only the first
seventy-two columns of an IBM card carry the codes; the last eight columns
are reserved for the card identification number. A part (i.e. a division
within each cell) is coded P.X, where X stands for the number of the part.
Thus, part one is coded P.1, part two is coded P.2, and so on. For
example, the coding of the column for the logograph 給 (p. 64) will look
like this:

*C.1□4822A□C.2□P.1NMP.2KAIP.33DP.44P.54P.6K□
C.3□P.12P.2TCP.3IP.40□C.4□P.1... etc. ... □C.9□P.1[?]P.2TCP.3IE[?]P.4[?]□
(P.12P.2KP.3[?]IP.4[?])□/

□ represents a significant space in the code. As stated earlier,
whenever our code specifies a space between symbols, we have left at least
two spaces blank. This extra spacing allows us to make corrections on the
IBM cards without shifting the data to preserve the spacing required by
the code format. The symbols which appear in Cells 2-19 are listed in
Appendix I.
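
The entry format just described (asterisk, cell addresses set off by blanks, closing slash) can be illustrated with a short Python sketch. It assumes that the cards of one entry have already been joined into a single string with the identification columns removed; the names CELL_RE and parse_entry are hypothetical.

```python
import re
from typing import Dict

# A cell address is "C." plus a one- or two-digit number, preceded by the
# opening asterisk (Cell 1 only) or a blank, and followed by blanks.
CELL_RE = re.compile(r"(?:^\*|\s)C\.(\d{1,2})\s+")


def parse_entry(text: str) -> Dict[int, str]:
    """Split one coded zihui column into {cell number: cell contents}."""
    text = text.strip()
    if not (text.startswith("*") and text.endswith("/")):
        raise ValueError("an entry must begin with '*' and end with '/'")
    body = text[:-1]  # drop the closing slash
    addresses = list(CELL_RE.finditer(body))
    cells = {}
    for i, match in enumerate(addresses):
        end = addresses[i + 1].start() if i + 1 < len(addresses) else len(body)
        cells[int(match.group(1))] = body[match.end():end].strip()
    return cells
```

Because only the presence of blanks matters, the same function accepts entries punched with one, two or more spaces between symbols.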

3. The computer program⁷

3.0. Purpose. The Chinese Dictionary Program is designed to read a
Chinese dictionary, Hanyu fangyin zihui, from cards and then print tables
consisting of certain parts of certain entries. Parts of entries are
printed only if the whole entry fulfills specified requirements. At the
present time, specified parts must contain one of several possible choices.
The program is organized so that the matching and printing specifications
may be easily and clearly given. The program can also make a thorough
check of the cards to make sure they are punched properly.

3.1. Organization. The entire Chinese Dictionary Program is actually
a collection of small programs built around the main dictionary-search
program. This main dictionary program reads in the Hanyu fangyin zihui
and prints specified parts of an entry if specified conditions are met.
The subsidiary programs do such things as reading in the print
specifications and converting them into a form usable by the main dictionary
routine, or reading in the match specifications and converting them for
use by the main dictionary routine. These separate programs are all tied
together by a master program called the Executive. The Executive Program
reads control cards and executes programs depending on what the control
card says. The standard form of control cards has a '$' in the first
column followed by a name. For example:

$DICT. This calls the main dictionary program. The basic operation
    of this program has already been explained. The cards of the
    Hanyu fangyin zihui must immediately follow. The program returns
    control to the Executive Program when a $-card is read.

$PRINT. This sets up the Print Table. All, or any number of, parts
    of a cell may be specified on a single specification card. The
    total number of parts is limited only by how much room is on the
    printer line (see $WIDTH). The following are some examples (each
    underscore stands for one blank):

    C.1                                 (print all of Cell 1)
    C.2_P.5__SUBRHYME                   (print Cell 2, Part 5)
    C.12_P.2_P.3__INITIAL_AND_NUCLEUS   (print Parts 2 & 3 of Cell 12)
    C.17                                (print all of Cell 17)

    The specifications must be in the same order as in the
    dictionary entry. Exactly one space must appear between each part of
    the specification. If two consecutive blanks are found, the rest
    of the card is treated as a comment. The specifications must start
    in the first column (i.e. a 'C' must appear in col. 1). A limited
    check is made to see if these rules are followed.

    The Print Table routine reads specification cards and builds
    up the Print Table until a card with a '$' in the first column is
    found, when it returns to the Executive Program.

$TITLE. This allows a title to be printed. The dictionary program
    doesn't print a title, but this option may be used. The paper is
    normally shifted to a new page; then a line is printed. This line
    is formed by taking together all eighty columns of the first card
    following and the first forty columns of the second card following.
    Thus, each $TITLE card may skip to a new page, then print a single
    line.

    If, however, the control card is of the form $TITLE SUPPRESS,
    then the next $TITLE card will not skip to a new page. This allows
    two or more lines to be printed at the beginning of a page.

    Each line is followed by a single blank line. After the two
    cards following the $TITLE card are taken care of, the Executive
    Program takes over. The next $TITLE card after a simple $TITLE
    card will be printed on a new page.

$WIDTH n. The normal field width for the printout of the main
    dictionary program is ten characters. This allows twelve parts to
    be printed out. The $WIDTH n control card changes this field
    width to n characters (n is a single digit). Thus, if n is six,
    twenty parts can be printed out. The value of n controls only
    the printing, not the comparisons used to determine whether
    something will be printed.

    If n isn't a digit (e.g. a letter), a message will be typed
    and the field width set to ten.

$PAUSE. This simply halts the execution until the START button is
    pushed. It gives the operator time to change the input.

$TYPEWRITER INPUT. This causes all subsequent input to be from the
    typewriter.

$CARD INPUT. This causes all subsequent input to be from the card
    reader.

$SEQUENCE CHECK. This causes the cards that form the Hanyu fangyin
    zihui to be sequence-checked. If two cards are out of order, a
    message is typed and the computer halts.

$NO SEQUENCE CHECK. This stops sequence checking.

$LIST CONTROL CARDS. This causes all subsequent control cards to be
    typed on the typewriter.

$UNLIST CONTROL CARDS. This stops the listing of control cards.

$END. This signals the end of the run. A message is typed and the
    IBM Monitor Program takes over.
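
How an executive routine of this kind might dispatch on control cards is sketched below in Python. This is a modern illustration under stated assumptions, not a transcription of the program flowcharted in Appendix II: the class and attribute names are hypothetical, the 120-character printer line is inferred from the figures given for $WIDTH (12 parts of 10 characters, 20 parts of 6), the initial settings are guesses, a width of 0 is treated as invalid, and the cards that call subsidiary routines ($DICT, $PRINT, $MATCH, $TITLE, $PAUSE) are not modelled.

```python
LINE_WIDTH = 120      # assumption: 12 fields x 10 characters
DEFAULT_WIDTH = 10


class Executive:
    """Toy model of the settings changed by the simple control cards."""

    def __init__(self) -> None:
        self.width = DEFAULT_WIDTH
        self.sequence_check = False   # initial state is an assumption
        self.list_cards = False
        self.source = "CARD"
        self.messages = []            # stands in for the console typewriter

    def parts_per_line(self) -> int:
        return LINE_WIDTH // self.width

    def execute(self, card: str) -> bool:
        """Apply one control card; return False once $END is read."""
        if not card.startswith("$"):
            raise ValueError("a control card has '$' in the first column")
        if self.list_cards:
            self.messages.append(card)
        name = card[1:].strip()
        if name.startswith("WIDTH"):
            n = name[5:].strip()
            if len(n) == 1 and n in "123456789":
                self.width = int(n)
            else:
                self.messages.append("BAD WIDTH, USING 10")
                self.width = DEFAULT_WIDTH
        elif name == "SEQUENCE CHECK":
            self.sequence_check = True
        elif name == "NO SEQUENCE CHECK":
            self.sequence_check = False
        elif name == "LIST CONTROL CARDS":
            self.list_cards = True
        elif name == "UNLIST CONTROL CARDS":
            self.list_cards = False
        elif name == "TYPEWRITER INPUT":
            self.source = "TYPEWRITER"
        elif name == "CARD INPUT":
            self.source = "CARD"
        elif name == "END":
            return False
        return True
```

Keeping the settings in one object mirrors the description of the Executive as the routine that ties the separate programs together.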

3.2. Method. The main dictionary-routine works basically on two
tables. The Print Table tells what specific parts of each entry are to
be printed in the case of a successful match. The Match Table tells what
things should match. Each entry in the Match Table consists of a part
number (e.g. 025 stands for Cell 2, Part 5) and a number of choices. If
the specified part of the entry exactly matches any of the choices in the
corresponding entry in the Match Table, then the dictionary entry remains
as a possible match. If the specified part of the dictionary fails to
match any of the choices, the dictionary entry is rejected, and a new
entry is considered.
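
The interplay of the two tables can be sketched in Python as follows. The data layout is an assumption made for the illustration: an entry is a dictionary keyed by (cell, part), with part 0 standing for a cell that has no parts, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Key = Tuple[int, int]  # (cell, part); part 0 for a whole cell is an assumption


def part_number(cell: int, part: int) -> str:
    """Three-digit part number as used in the Match Table, e.g. 025."""
    return f"{cell:02d}{part}"


def entry_matches(entry: Dict[Key, str], match_table: Dict[Key, Set[str]]) -> bool:
    """An entry survives only if every specified part equals one of its choices."""
    return all(entry.get(key) in choices for key, choices in match_table.items())


def search(
    entries: Iterable[Dict[Key, str]],
    match_table: Dict[Key, Set[str]],
    print_table: List[Key],
) -> Iterator[List[str]]:
    """Yield the parts named in the Print Table for each matching entry."""
    for entry in entries:
        if entry_matches(entry, match_table):
            yield [entry.get(key, "") for key in print_table]
```

An empty Match Table rejects nothing, so every entry is printed; each added specification can only narrow the result.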

The main dictionary-routine may also be changed so that it prints or
types a message if the dictionary entry fails to match. This is
particularly useful when the Match Table specifies all legal possibilities,
so that any incorrect cards may be found. Though the dictionary routine
normally does a partial check on the entries, it cannot catch all of the
mistakes without this change. The change may be easily made by means of a
control card.

If a mistake is found, the typewriter types a message, e.g. ERROR 02,
and stops. When the START button on the console is pushed, the typewriter
will type out the offending card and halt again. If the START button is
pressed again, the computer will ignore the error and continue as if it
were correct. A table of possible errors, their meanings, and possible
treatments follows:









Error Table

Number  Routine  Explanation
01      DICT     Neither a 'C' nor a 'P' precedes a '.'
02      DICT     A ' ' doesn't follow a 'C.nn'
03      DICT     A '.' doesn't follow a 'C.n P' or a 'C.nn P'
04      DICT     Sequencing error
05      PRINT    First character isn't a '$' or a 'C'
06      PRINT    A '.' doesn't follow a 'C'
07      PRINT    A ' ' doesn't follow a 'C.nn'
08      PRINT    A '.' doesn't follow a 'C.n P' or a 'C.nn P'
09      PRINT    A ' ' doesn't follow a 'C.nn P.n' or a 'C.n P.n'
10      MATCH    First character isn't a '$' or a 'C'
11      MATCH    A '.' doesn't follow a 'C'
12      MATCH    A ' ' doesn't follow a 'C.nn'
13      MATCH    A 'P' doesn't follow a 'C.n ' or a 'C.nn '
14      MATCH    A '.' doesn't follow a 'C.n P' or a 'C.nn P'
15      MATCH    A character that isn't a ' ' or a ',' is between choice fields
16      MATCH    The end of the card was reached while forming a choice
17      MATCH    A choice field was too long




Other possible errors (each underscore stands for one blank):

(1) C.1__THE_LOGOGRAPH                     (incorrect order)
    C.4
    C.2__ANCIENT_CHINESE_CLASSIFICATION
(2) C.3__P.3_P.4_NUCLEUS_AND_ENDING        (P.3 P.4 are treated as a comment)
(3) C.2_P.1_P.4_P.6_P.5                    (incorrect order)
(4) C.1O                                   (letter O instead of digit 0)
(5) C.2_P.I_P.5__PEKING_DIALECT            (letter I instead of digit 1;
                                            too high a part number)



Setting up the Match Table

The Match Table tells the program what the parts of a dictionary
entry should be. If all parts of an entry match the specifications, the
dictionary routine will print out the entry (as specified by the Print
Table). There may be a unique specification for a particular part of a
cell, or there may be several choices. If a part of a dictionary entry
matches any of the choices, it remains a possible match, and will be
printed when the end of the entry is reached. If a part of an entry
fails to match any of the choices, the entry is skipped and a new entry
is begun.

The routine to set up the Match Table is called by a $MATCH card.
























This card must be followed by cards specifying what parts must match with
which choices. Each card gives information about one part. For example
(each underscore stands for one blank):

    C.1 /1572/, /3632/, /60[illegible]/, /5006/
    C.2_P.1 /W0/, /[illegible]/
    C.2_P.2 /KAI/

The specifications must be in the same order as in the dictionary
entry. Exactly one space must occur between the C-part and the P-part of
the specification. The specifications must start in the first column
(i.e. a 'C' must appear in col. 1).

When the Match Table routine reads a card with a '$' in the first
column, it returns to the Executive Program.
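
A match specification card of this shape could be read with the Python sketch below. It is an illustration only: the names are hypothetical, real blanks are used where the report prints underscores, and the error messages merely paraphrase the kinds of mistakes listed in the Error Table.

```python
import re
from typing import List, Optional, Tuple

# "C.n" or "C.nn" in column 1, optionally followed by one blank and "P.n".
SPEC_RE = re.compile(r"^C\.(\d{1,2})(?: P\.(\d))?(?= |$)")
# A choice field is whatever stands between a pair of slashes.
CHOICE_RE = re.compile(r"/([^/]*)/")


def parse_match_card(card: str) -> Tuple[int, Optional[int], List[str]]:
    """Return (cell, part or None, list of choices) for one specification card."""
    spec = SPEC_RE.match(card)
    if spec is None:
        raise ValueError("specification must start with 'C.' in column 1")
    cell = int(spec.group(1))
    part = int(spec.group(2)) if spec.group(2) else None
    rest = card[spec.end():]
    choices = CHOICE_RE.findall(rest)
    # Only blanks and commas may stand between choice fields.
    if CHOICE_RE.sub("", rest).strip(" ,"):
        raise ValueError("unexpected character between choice fields")
    return cell, part, choices
```

For instance, parse_match_card("C.2 P.2 /KAI/") gives cell 2, part 2 and the single choice KAI.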









4. Errors in the zihui

As Shi Wen-tao pointed out in his review (1963), the zihui contains
numerous errors of various types. In the course of our coding, we took
into account all the errors noticed by Shi and have corrected those for
which he gives the necessary correction. Unfortunately, Shi did not have
the opportunity to weed out all the errors in the zihui, but merely listed
the types of errors which occur, giving several examples for each.
Therefore, we had to be on the alert to catch the remaining errors and
correct them to the best of our ability. A list of the errata in the zihui
is given in Appendix III of this report. It includes the specific errors
mentioned by Shi, as well as those discovered by our coders.

Some of the errors were easily corrected. Others, although we noted
the error involved, were not corrected, since to do so would require a
consultation with native speakers of the pertinent dialects. In many
cases, our errata list merely states that a certain form may be erroneously
transcribed or that some element is missing, but we were unable to provide
the correct form. In those cases where an element is omitted, we have
coded the missing part as XX; thus, the computer will be able to retrieve
all incomplete entries, and we shall be able to correct them in the
future by consulting Chinese informants. We shall simply have to go
through our errata list and attempt to correct other types of errors as
the occasion arises.
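
Retrieving the incomplete entries marked with XX amounts to a simple scan, sketched here in Python. The data layout (a dictionary from Cell 1 code to coded parts keyed by cell and part) and the function name are assumptions made for the illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (cell, part)


def incomplete_parts(entries: Dict[str, Dict[Key, str]]) -> Dict[str, List[Key]]:
    """Map each Cell 1 code to the parts that were coded as the placeholder XX."""
    report = {}
    for code, parts in entries.items():
        missing = sorted(key for key, value in parts.items() if value == "XX")
        if missing:
            report[code] = missing
    return report
```

The resulting list tells a coder exactly which cells and parts to raise with an informant.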

Most of the errors found in the zihui can be classified under the
following headings:

(1) Omission.

Tone marks are omitted quite frequently. There is reason to suspect
that in many cases a nasalization mark is missing, but there was no way
of checking this supposition.

(2) Miswritten or wrong logographs in Cell 2.

There are numerous cases of this throughout the zihui. For example,
on page 29, Cell 2, Part 6 for the logograph [logograph] has [logograph]
instead of [logograph]; on page 25, Cell 2, Part 6 for the logograph
[logograph] has [logograph] instead of [logograph].

(3) Mismatch between the rimeme (攝) and the subrime (韻).

For example, on page 38, Cell 2, Parts 1 and 5 of the logograph
[logograph] do not match, since [logograph] is a subrime of [logograph],
not of [logograph].

(4) Mismatch between a dialect form and the phonetic inventory of the
same dialect (as listed in the introductory chapter of the zihui).

This type of mismatch often involves tones as well as the segmental
elements. For example, on page 4, the logograph [logograph] in the Tai-yuan
dialect (Cell 6) is listed as being in [tone] tone, whereas, according to
the inventory of tones for this dialect, there is no distinction between
[tone] and [tone]. On page 10, the logograph [logograph] is listed as
having the final -i[?]p in Chao-zhou, but the inventory of finals does not
list this final for Chao-zhou. According to Shi Wen-tao, in some instances
the phonetic inventory is incomplete as it stands and the dialect forms
recorded in the zihui are actually correct.

(5) Haphazard handling of the doublets in Ancient Chinese.

It appears that the editors of the zihui either ignored or did not
know how to deal with the problem of doublets, i.e. those logographs
which had more than one phonetic value in Ancient Chinese and were
accordingly listed under different categories in the rime books. In most cases,
the editors give in Cell 2 that category which most of the dialect forms
fit, without noting that some of the dialects have a form which reflects
another Ancient Chinese value of the logograph in question. For example,
on page 159, the logograph is listed as having Jl as its tone in
Ancient Chinese and belonging to the subrime ^ . However, in many
dialects this word is read as ^ : Han-kou, Wen-zhou, Chang-sha,
Nan-chang, Guang-dong and Chao-zhou. If we consult the Guang yun we
discover that ^ is listed there under the subrime ^ ( Jl ^ as well
as the subrime ). For the purposes of establishing
sound correspondences it would be better to split such matrices into two
or more matrices, each reflecting a different Ancient Chinese
classification.
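The splitting proposed here can be sketched as follows. This is an illustration only: the matrix layout, the table of expected reflexes (`readings`), and the function name are assumptions introduced for the example, and the tone codes are placeholders rather than values taken from the zihui.

```python
def split_matrix(matrix, readings):
    """Split one logograph matrix into one matrix per Ancient Chinese reading.

    matrix:   {"logograph": str, "forms": {dialect: tone_code}}
    readings: {ancient_tone: {dialect: expected_tone_code}}  (hypothetical)
    Forms that fit no reading, or more than one, are returned separately.
    """
    result = {
        tone: {"logograph": matrix["logograph"], "ancient_tone": tone, "forms": {}}
        for tone in readings
    }
    unassigned = {}
    for dialect, tone_code in matrix["forms"].items():
        matches = [t for t, expected in readings.items()
                   if expected.get(dialect) == tone_code]
        if len(matches) == 1:
            result[matches[0]]["forms"][dialect] = tone_code
        else:
            unassigned[dialect] = tone_code  # left for manual inspection
    return result, unassigned

readings = {
    "2": {"Peking": "2", "Han-kou": "2"},
    "3": {"Peking": "3", "Han-kou": "3"},
}
matrix = {"logograph": "L159",
          "forms": {"Peking": "2", "Han-kou": "3", "Amoy": "1"}}
split, unassigned = split_matrix(matrix, readings)
```

Keeping the unassigned forms apart, rather than forcing them into one of the new matrices, leaves the decision to a human reader.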

This problem is further complicated for those logographs which have
two different categorizations in the Ancient Chinese rime books: the
editors of the zihui have split these into two separate matrices without
regard to the behavior of these doublets in other Chinese dialects. This
overemphasis on the written representation of Chinese morphemes as well
as the overall orientation towards Peking Mandarin results in a skewed
picture of the phonological correspondences among individual dialects: in
many dialects the phonological distinction between the doublets is erased,
perhaps through analogy; the statistically dominant phonological value of
the logograph may replace the less-used phonological value of the same
logograph.

For example, in Peking Mandarin the logograph has two pronunciations:
tɕiau (ping sheng) in the expression 'to teach', and
tɕiau (qu sheng) in the expression 'education'. This tonal
distinction is duly reflected in the Guang yun, which classifies the first
instance as belonging to the subrime ^ and the second instance
as belonging to the subrime ^ . However, a glance at the
two matrices for this logograph in the zihui (pages 144 and 145) shows
that although ^ in the expression ^ is reflected as a ^
in every dialect listed, in the case of ^ , Mandarin
dialects, the Wen-zhou dialect, and the Mei-xian dialect reflect the ^ ,
but none of the other dialects reflect the expected tone, namely ^ .
Obviously, in the latter dialects the two values of this logograph merged
into one which is pronounced ^ .

From the point of view of comparative morphology, this and similar
phenomena are, to be sure, highly interesting. On the other hand, if we
are primarily interested in tabulating phonological correspondences, the
arrangement of the data in the zihui is very unsatisfactory, and, in the
case described above, would cause the computer to tabulate the following
tone correspondence which is obviously invalid: Ancient Chinese ^ :
Mandarin dialects : Mei-xian dialect : Wen-zhou dialect
: other dialects .






Of course, the number of logographs showing such faulty correspondences
will probably be so small that it will be easy to spot these anomalies.
For our purposes, however, it would have been much more desirable to
leave blank those cells which show morphological levelling rather than the
regular phonological development.
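The effect of blank cells on the tabulation can be illustrated with a short sketch. It assumes a hypothetical matrix layout in which a blank cell is stored as `None`; the tone codes and dialect names are placeholders, and the function is not the project's tabulation routine.

```python
from collections import Counter

def tabulate_tone_correspondences(matrices):
    """Count (ancient tone, dialect, reflex) triples.

    A cell holding None is treated as blank and contributes nothing,
    so a levelled form left blank cannot create a false correspondence.
    """
    counts = Counter()
    for m in matrices:
        for dialect, tone in m["forms"].items():
            if tone is None:
                continue
            counts[(m["ancient_tone"], dialect, tone)] += 1
    return counts

matrices = [
    {"ancient_tone": "3", "forms": {"Peking": "3", "Han-kou": None}},
    {"ancient_tone": "3", "forms": {"Peking": "3", "Han-kou": "3"}},
]
counts = tabulate_tone_correspondences(matrices)
```

Counting per dialect, instead of per whole row, means one blank cell does not discard the remaining forms of the same logograph.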

The problem of the doublets in the Zihui is so complex that it was
not possible to solve it during the course of the coding. One way out of
this dilemma is to leave the matrices as they are, and then, by utilizing a
special computer program, to extract and tabulate all the data on the
doublets and the irregular correspondences which may involve doublets.
Once such a tabulation is available, the necessary splitting of the matrices
and the deletion of the irregular forms can be effected.
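One way such an extraction pass might work is to treat the most frequent reflex of each Ancient Chinese tone in each dialect as regular and to list every form that departs from it. The sketch below makes that assumption explicit; the data layout and the name `flag_irregular` are hypothetical, and a frequency criterion is only one possible definition of "irregular".

```python
from collections import Counter, defaultdict

def flag_irregular(matrices):
    """List (logograph, dialect, reflex, dominant reflex) for every form
    whose tone differs from the dominant reflex of its ancient tone."""
    tallies = defaultdict(Counter)
    for m in matrices:
        for dialect, tone in m["forms"].items():
            tallies[(m["ancient_tone"], dialect)][tone] += 1
    flagged = []
    for m in matrices:
        for dialect, tone in m["forms"].items():
            dominant, _ = tallies[(m["ancient_tone"], dialect)].most_common(1)[0]
            if tone != dominant:
                flagged.append((m["logograph"], dialect, tone, dominant))
    return flagged

matrices = [
    {"logograph": "L1", "ancient_tone": "3", "forms": {"Han-kou": "3"}},
    {"logograph": "L2", "ancient_tone": "3", "forms": {"Han-kou": "3"}},
    {"logograph": "L3", "ancient_tone": "3", "forms": {"Han-kou": "1"}},
]
irregular = flag_irregular(matrices)
```

The flagged list is a worklist for inspection, not a verdict: some entries will be doublets to split, others genuine exceptions.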

(6) Faulty cognates.

There are cases where one would suspect that a dialect form does not
belong in a particular matrix since it shows very unusual sound
correspondences. For example, on page 41 we have the logograph ^ which in most
of the dialects has ts as initial (and according to Guang yun, has the
initial ^ in Ancient Chinese, usually reconstructed as ts); one of
the colloquial forms in the Amoy dialect, however, is ^kia (along with
^tsi and ^dzi), which does not seem to belong in the same phonological
category. Again, on page 70 the Amoy colloquial reading of the logograph
is given as tsit^, but no other dialect shows ts as an initial, and
the Guang yun classification indicates that the Ancient Chinese initial
was a glottal stop. This Amoy form may be a reflex of some other Chinese
word cognate to Tibetan chig ('one') or a loan word from some
Tibeto-Burman language. (Judging from several Amoy forms, I suspect that the
finals -ik and -it are sometimes confused with each other.)









Nevertheless, it seemed best not to alter or discard such examples.
They can be singled out for special study after the computer tabulates all
apparent irregularities.

In addition to the above-mentioned types of errors, there are numerous
minor errors of omission and commission. The IPA symbols are often
written cursively; what is more confusing, the same IPA symbol varies radically
in its graphic representation. Thus, 1 varies from c to l , 2 to 1 (see
pages 99, 102, and 123 of the zihui).

The general inadequacy of the zihui, however, lies in its basic
orientation to the Chinese logograph. Some of the errors which are caused
by this orientation were already discussed in connection with the problem
of doublets; another type of error will illustrate this point further.

On page 166, the logograph ^ is listed as having the following
Ancient Chinese classification: ^ . However, ^ is
a subrime of the ^ , not of ^ . Thus, there is a mismatch
between the rimeme ( ^ ) and the subrime ( ^ ). It turns out
that Guang yun lists two different logographs, ^ and ^ ; the former
is classified as ^ , the latter as ^ . ^ is
currently used as an abbreviated form of ^ , since in Peking Mandarin
the pronunciation of the two logographs is identical, although in Ancient
Chinese the two words ended in -n and -m, respectively. It appears that
^ in the zihui is the abbreviated form of ^ , since those dialects
which have preserved the distinction between -n and -m show -m as the final.

This raises the question: are there many more mistakes which have
been caused by the use of abbreviated logographs? We must keep in mind
that the abbreviations were usually devised on the basis of Mandarin
pronunciation, whereas many dialects preserve more original distinctions than
Mandarin. Second, if written logographs were used to elicit phonetic data
from dialect speakers, how much care was taken to see that the informants
did not merely give the pronunciation of the phonetic element of a
logograph if the morpheme represented by a particular logograph was not
currently used in their dialect? In the above case, both ^ and ^ occur
frequently as phonetic elements in logographs, and confusion is very likely
to occur.

On the whole, however, the bulk of the data in the Zihui appears to
be valid in spite of the methodological faults involved in its compilation,
and will undoubtedly yield much relevant information concerning phonological
developments in Chinese dialects.









Appendix I.

Computer code for the Hanyu fangyin zihui.

PART I (The code for Cell 2)

(a) The 16 she (Cell 2, Part 1):





    ^ = NO      ^ = WO
    ^ = OO      ^ = WI
    ^ = NI      ^ = WU
    ^ = NU      ^ = WM
    ^ = NM      ^ = WN
    ^ = NG      ^ = XN
    ^ = OG      ^ = WG
    ^ = PG      ^ = XG


(b) Kai-kou vs. he-kou (Cell 2, Part 2):

    kai-kou = KAI
    he-kou  = HE

(c) The four deng (Cell 2, Part 3):

    first deng  = 1D
    second deng = 2D
    third deng  = 3D
    fourth deng = 4D

(d) The four tones (Cell 2, Part 4):

    ping  = 1
    shang = 2
    qu    = 3
    ru    = 4
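The code tables for Parts 1 to 4 lend themselves to a simple validity check at keypunch time. The sketch below is hypothetical: the function name and tuple layout are illustrative, the she codes are copied as they are legible in this appendix, and `XX` is accepted because the report uses it for missing or unrecoverable elements.

```python
SHE_CODES = {"NO", "WO", "OO", "WI", "NI", "WU", "NU", "WM",
             "NM", "WN", "NG", "XN", "OG", "WG", "PG", "XG"}
KOU_CODES = {"KAI", "HE"}
DENG_CODES = {"1D", "2D", "3D", "4D"}
TONE_CODES = {"1", "2", "3", "4"}

def encode_cell2(she, kou, deng, tone, subrime, initial):
    """Validate Parts 1-4 against the code tables and return the six
    parts of Cell 2 as a tuple of strings."""
    checks = (("she", she, SHE_CODES), ("kou", kou, KOU_CODES),
              ("deng", deng, DENG_CODES), ("tone", tone, TONE_CODES))
    for name, value, allowed in checks:
        if value != "XX" and value not in allowed:
            raise ValueError(f"unknown {name} code: {value!r}")
    # Parts 5 and 6 (subrime number, initial) are passed through unchecked here.
    return (she, kou, deng, tone, str(subrime), initial)

record = encode_cell2("WU", "KAI", "2D", "3", 5, "K")
```

Rejecting unknown codes at entry is cheaper than discovering them later as spurious categories in the tabulation.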

(e) The yun or subrimes (Cell 2, Part 5). (Arranged according to she):








(2)^^ (00) 


* 1 


It- 1 


2 


% - 2 


3 


3 


l-x* 


X- •» 


" 5 


X- 5 


4’Jf- 6 


j&- 6 


i" ’ 




4- a 




j§.= 9 




(>*) (WI) 




II 

H 


'll. = 11 


CM 

n 

l»N 


^ “ 12 


/(\= 3 


#K- 13 


4 


1*^ 


A “ 5 


15 


1^1= 6 




f|t= 7 




■f = 8 


-W- 

18 


= 9 




1’lr s 10 








(3)1^^ (WO) 



1 

■ 2 

« 3 






%TW 



1 






(7) 



^tlk (“I) 


(6) 


i. - 1 


^ “ 


1 




2 


= 


2 




t= 3 


%• 


3 




II 

9(P 


^ " 


4 




- 5 


<Ji 

II 


5 




^ * 6 




6 




7 


t = 


7 




^ a 8 


'1' = 


8 




?&' * 9 




9 




10 




10 




A’ n 


Hr 


11 








12 




^ (HU) 


(8) A 


(WM) 




1 


f = 1 


W » 


16 


/!•= 2 


-^= 2 


1 * 
jatt* 


17 


'fl' 3 


|/l = 3 


II 

\>A 


18 


;t= 


/^= *» 


ii= 


19 


i- 5 






20 


'f V ^ 


tiC.= 6 




21 


4fe/“ 7 


7 




22 


^*>11= 8 


1= 8 


®JL“ 


23 


II 

VO 


M(i= 9 




24 








25 








26 




.;^=12 




27 








28 




II 

fr 


-■^= 


29 




4^=15 




30 



(wu) 






^ “ 31 

1^-32 
















(9) >5L^ (»«) 


(10) )U 


in (™) 


>j^ = 1 


1 


tR3 = 15 


)1- 2 


^ * 2 


■ 18 


-;iO» 3 


3 


'iilt = 17 






» l8 




4.= 5 


.4= 19 


(11) (mg) 


4= 6 


20 


^= 1 


4= 7 


5Cj= 21 


i= 2 


. X= 8 


piL= 22 


il“ 3 


«- 9 


23 


A’ 


^.10 


^= 24 


A“ 3 


a 11 

v*^ 


lf\ 

CM 

n 


6 




#*= 26 


7 


a 13 


f-’ 27 


'k,= 8 


i-ii* 


y|= 28 


^= 9 






10 






n 






(12) i^^(x») 








>1= 8. 


= 15 


4&.= 2 


9 


4 = 18 


^fi,= 3 


^"10 


4^J= 17 


ISj- U 


1=*^ 


^ * l8 

/X 


2<|- 5 


^»12 


<i^- 19 






^,= 20 


li 


il 

E- 


21 






(13) (“) 

^ - 1 
% - 2 
- 3 

/^ “ ** 
i = 5 

-k~ 7 

SB 8 



( 15 ) 




(PG) 

1 

2 

3 

4 

5 

6 




(14) (WG) 



yZ. - 






t 



1 

2 

3 

4 



(16) 

2 

0^* 3 

P§ ■ “ 

^ = 5 
6 

7 

4=8 

>|[ * 9 
-fei a 10 

« 11 
= 12 
« 13 

= l4 
= 15 
a 16 



(XG) 



i 







(f) The initials or zhaH (Cell 2, Part 6):






i. Labials 








f f 






F 


■ p’ 






F* 


^ » B 






V 


- M 






1C 


ii. Dentals 








= 1 






TS 


- T* 




v| = 


TS* 


» D 




c- 


DZ 


“ H 




/C = 


S 






#f = 


Z 


iii. Palatals and retroflexes








» TJ 


^ 3 TSJ 


HI- ■ 


TSH 




$; = TSJ* 


= 


TSH 


DJ 


= DZJ 


& 


DZH 


Q = NJ 


SJ 


it- - 


SH 




=^= ZJ 




ZH 


iv. Velars, etc. 








i= K 


h = • 






>1 “ 


0|L- X 




‘ 


inf » G 


[£ « GH 


i 




N6 


= GHJ 


, 





■ 0 (Zero) 



v. Lateral

    ^ = L






PART II (The code for Cells 3-19)

(a) Tones (Part 1):



C^)f = 1 




2 




3. ^ 


a 4 




f|-t = 


2B 




3B ^ X 


a 4B 




- 




vjz 


a 4C 


(b) Initials (Part 2):










Zero initial:   0 (zero)

Stops and affricates:

    Labial       Alveolar     Palatal      Retroflex, etc.   Postvelar
    p   = P      t   = T                   tʂ  = TSR         k   = K
    p'  = PH     t'  = TH                  tʂ' = TSRH        k'  = KH
    pf  = PF     ts  = TS     tɕ  = TC     tʃ  = TSP         kw  = KW
    pf' = PFH    ts' = TSH    tɕ' = TCH    tʃ' = TSPH        kw' = KWH
    b   = B      d   = D                                     g   = G
    ^   = BQ     ^   = DQ
                 dz  = DZ                  dʐ  = DZR

Note: Noninitial H always stands for aspiration.
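The convention that a noninitial H marks aspiration makes the aspirated series recoverable from the code alone. A minimal sketch follows; the function name is an assumption introduced for illustration.

```python
def split_aspiration(code):
    """Return (unaspirated base code, aspirated?) for an initial code.

    Only an H after the first letter counts as aspiration, so the
    fricative code H and codes beginning with H are left intact.
    """
    head, tail = code[:1], code[1:]
    return head + tail.replace("H", ""), "H" in tail

for code in ("PH", "TSRH", "KWH", "H", "HV"):
    print(code, split_aspiration(code))
```

Pairing each aspirate with its plain counterpart in this way allows correspondences to be tabulated by place of articulation and by aspiration separately.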




Fricatives : 












Labial 


Alveolar 




Palatal 


Retroflex, etc. 


Postvelar 


II 


s a S 




^ a C 


1- SR 


X a X 


p « W 


z a Z 




J a J 


1 - ZR 


Y a XV 


f a F 








S = SP 


h a H 


a FH 


a SH 




9^ a CH 


gh a SRH 


fi a HV 


V a V 


2^ a ZH 






5 “ zz 










Liquids: 

I 

v-M 1-L J«R 

i" • IS 
1* - IH 

Nasals: 



m > M 


n - N 


It 

s^ 


I) - NO 


« MH 


n^ - HH 


- NJH 


rf^ - NGH 


(c) Vocalic nuclei (Part 3):






1 - I 


y - Y 


u - U 




I - 11 


Y - Y1 


U - U1 




1-12 


'J « Y2 


o » 0 




2-13 




V - 01 




e - £ 


0-03 


0 - 02 




E - El 




U4» U3 




e - S2 


oe - OE 






9 - E3 


-e- ® o4 






sr » E4 


9 * OU (Fdi.z|)ou) 






3 » E5 








X- A£ 




a - A1 




a - A 




» A2 




Syllabic nasals and syllabic liquid:

    m = MM      n = NN      ŋ = NNG      l = LL

Note: A nasalized vowel is coded vowel plus Z. Vowel length (:) is coded
vowel plus W.
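Because none of the vowel codes legible in this table contains Z or W, the two suffixes can be stripped unambiguously. The sketch below assumes that the suffixes follow the complete vowel code (including any digit) and may appear in either order; both points are assumptions, since the note does not specify them.

```python
def decode_vowel(code):
    """Split a vowel code into (base code, nasalized?, long?)."""
    nasalized = False
    long_vowel = False
    # Strip trailing Z (nasalization) and W (length) markers.
    while len(code) > 1 and code[-1] in "ZW":
        if code[-1] == "Z":
            nasalized = True
        else:
            long_vowel = True
        code = code[:-1]
    return code, nasalized, long_vowel

for code in ("A", "AZ", "E3W", "OZW"):
    print(code, decode_vowel(code))
```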



(d) Endings (Part 4):

    m = M      n = N      ɲ = NJ      ŋ = NG
    p = P      t = T      c = KJ      k = K

Note: Zero ending is coded explicitly as 0 (zero).

Special symbols:

XX in any cell or part indicates that an element is either missing or is
incorrectly specified in the zihui, but that we were unable to supply the
missing element or correct the error.
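A program reading Part 4 has to keep three cases apart: an explicit zero ending, an element marked XX, and a real ending. The sketch below shows one way to do so; the function name and return format are hypothetical, and the codes NJ and NG are read from a partly illegible line of the table.

```python
ENDING_CODES = {"M", "N", "NJ", "NG", "P", "T", "KJ", "K"}

def read_ending(code):
    """Classify a Part 4 code as ('zero', None), ('unknown', None),
    or ('ending', code)."""
    if code == "0":
        return ("zero", None)      # explicit zero ending
    if code == "XX":
        return ("unknown", None)   # missing or uncorrectable in the source
    if code in ENDING_CODES:
        return ("ending", code)
    raise ValueError(f"not an ending code: {code!r}")

print([read_ending(c) for c in ("0", "XX", "NG")])
```

Keeping XX distinct from zero matters for the tabulation: an unknown ending should be skipped, whereas a zero ending is itself a reflex to be counted.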














Appendix II.

Flowcharts for the computer program
David Forthoffer




    $ TITLE                   INIT.T
    $ PRINT                   INIT.P
    $ MATCH                   INIT.M
    $ DICT                    DICT
    $ TYPEWRITER INPUT        TY.IN
    $ CARD INPUT              CRD.IN
    $ SEQUENCE CHECK          SEQCH
    $ NO SEQUENCE CHECK       NSEQCH
    $ LIST CONTROL CARDS      LISTCC
    $ UNLIST CONTROL CARDS    UNLIST
    $ WIDTH                   WIDTH
    $ PAUSE                   PAUSE
    $ END                     THEEND
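If the chart above is read as a dispatch from control cards to routine labels, it amounts to a lookup table. That reading is an assumption, as is the card name TITLE (the first entry is barely legible in the source); the function name is hypothetical.

```python
# Control card name -> routine label, as legible in the chart.
CONTROL_CARDS = {
    "TITLE": "INIT.T",            # card name is a conjectural reading
    "PRINT": "INIT.P",
    "MATCH": "INIT.M",
    "DICT": "DICT",
    "TYPEWRITER INPUT": "TY.IN",
    "CARD INPUT": "CRD.IN",
    "SEQUENCE CHECK": "SEQCH",
    "NO SEQUENCE CHECK": "NSEQCH",
    "LIST CONTROL CARDS": "LISTCC",
    "UNLIST CONTROL CARDS": "UNLIST",
    "WIDTH": "WIDTH",
    "PAUSE": "PAUSE",
    "END": "THEEND",
}

def routine_for(card):
    """Return the routine label for a control card such as '$ CARD INPUT'."""
    text = card.strip()
    if not text.startswith("$"):
        raise ValueError("control cards begin with '$'")
    name = " ".join(text[1:].split()).upper()  # normalize spacing and case
    return CONTROL_CARDS[name]

print(routine_for("$ CARD INPUT"))
```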









SIMPLIFIED FLOW CHART FOR THE MASTER PROGRAM

    Read an entry and classify it
    Is it in one of Class I, II, II     (branches: T / NO)
    Store on disk according to the class
    For each class:

NOTE: This program is designed to classify any segment of data into
categories with an unlimited number of subcategories with matching
items.
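The classification described in the note, categories with any number of nested subcategories, can be sketched recursively. This illustrates the idea only and is not the master program: the function name, the use of key functions, and the sample entries are assumptions.

```python
from collections import defaultdict

def classify(entries, keys):
    """Group entries into nested categories, one level per key function.

    With no keys left, the matching items themselves are returned.
    """
    if not keys:
        return list(entries)
    groups = defaultdict(list)
    for entry in entries:
        groups[keys[0](entry)].append(entry)
    return {label: classify(members, keys[1:])
            for label, members in groups.items()}

# Hypothetical entries: (ancient tone, dialect)
entries = [("3", "Peking"), ("3", "Amoy"), ("1", "Peking")]
by_tone_then_dialect = classify(entries, [lambda e: e[0], lambda e: e[1]])
```

Adding a further key function adds a further level of subcategory without any change to the routine, which is the sense in which the depth is unlimited.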












page 

1 

2 

4 

5 

9 

10 
10 

u 

11 

11 

14 

15 






Appendix III

A list of the errata in the Hanyu fangyin zihui

Hsin-i Hsieh



character cell 






comment 

-uek does not appear in the inventory of 
finals 

-uap does not appear in the inventory of 
finals. 



f 




^na should be ^na. 


#1 




-oik does not appear in the inventory of 
finals . 




^ 'H I 


lacks tone mark. 






J:. should be . Giving yiin gives 




•'■'1 


-idp does not appear in the inventory of 
finals. 


A 




-p does not appear in the inventory of 
finals . 


A 


'''1 


z- does not appear in the inventory of 
initials . 


A 




-isu does not appear in the inventory of 
finals. 


% 




lacks tone mark. 






-iek does not appear in the inventory of 
finals . 






lacks tone mark. 









19 



comment 






page 

19 

19 

19 

24 



character 

i’l 

M 

X 



cell 



t? 






26 




2 


26 




2 


26 


% 




27 


iL 


2 


27 






28 


4 




29 




2 


29 




2 


29 


% 


2 


29 






31 






32 




-Hi 


33 







lacks tone mark, 
lacks tone mark, 
lacks tone mark. 

does not appear in the Inventory of 
finals . 

should be . 

^ should be X • 

- of does not appear In the inventory of 
finals. 

Vjdp should be ^ . 

lacks tone mark. 

-ie? does not appear in the inventory 
of finals. 

should be ; 



should be 






>g7 should be B 
^ should be B . 

-ue does not appear in the inventory of 
finals. 

cxua'’' should be c^ue and xue . 
exua is from the reading \ ^ xu® 
from the reading Guang yun. 

-ie does not appear in the inventory of 
finals . 

-iX does not appear in the inventory of 
finals. 

















page   character   cell   comment

33     ^                  -l2t does not appear in the inventory of finals.
34     ^                  -lep does not appear in the inventory of finals.
35     ^                  ct^'ie could be a mistake for ^ .
38     ^           2      ^ should be ^ .
44     ^                  lacks tone mark.
44     ^                  lacks tone mark.
44     ^                  -Ip does not appear in the inventory of finals.
46     ^                  lacks tone mark.
46     ^                  ts' should be tʂ' .
49     ^                  a more common pronunciation is ^ .
55     ^           2      ^ should be ^ .
56     ^           2      ^ and ^ represent the same word, 'bank'. In
                          Guang yun, ^ has two pronunciations, one with
                          the voiced initial ^ , the other with the
                          voiceless initial ^ . Some dialects reflect the
                          former pronunciation; others reflect the latter.
                          Probably this is why modern reflexes from
                          Ancient ^ (and ^ ) appear somewhat irregular.
59     ^           2      ^ should be ^ .
59     ^           2      should be ^ .
64     ^                  should either be ^ or ^ .
64     ^                  -a does not appear in the inventory of finals.






1 



page   character   cell   comment

65     ^                  -le does not appear in the inventory of finals.
65     ^                  -is does not appear in the inventory of finals.
67     ^           2      should be ^ .
74     ^           2      should be ^ .
77     ^                  -uo does not appear in the inventory of finals.
87     ^                  ct§*Wo should be ^ .
89     ^           2      should be ^ .
97     ^           2      should be ^ .
101    ^                  lacks tone mark.
113    ^                  -oe^? does not appear in the inventory of finals.
123    ^           2      ^ should be ^ .
124    ^           2      ^ should be ^ .
127    ^           2      ^ should be ^ .
128    ^                  po' should be po^ .
131    ^                  t'o^ should be t'o^ .
134    ^                  ^ is usually pronounced ts'au^ .
139    ^           2      should be ^ .
142    ^                  lacks tone mark.
154    ^                  -su does not appear in the inventory of finals.
158    ^                  lacks tone mark.
158    ^                  is also pronounced ^ .









page   character   cell   comment

158    ^                  lacks tone mark.
159    ^           2      according to Guang yun, ^ has both a shang sheng
                          and a qu sheng reading.
159    ^                  lacks tone mark.
159    ^                  lacks tone mark.
160    ^                  should be ^ .
160    ^                  lacks tone mark.
162    ^                  lacks tone mark.
162    ^                  lacks tone mark.
162    ^                  lacks tone mark.
162    ^                  lacks tone mark.
162    ^                  lacks tone mark.
162    ^           2      should be ^ .
163    ^                  ->ud‘^ does not appear in the inventory of finals.
163    ^           2      should be ^ .
164    ^                  p*3e,’ should be p'ae^ (?)
164    ^           2      should be ^ .
164    ^                  -uam does not appear in the inventory of finals.
164    ^           2      ^ has both ping sheng and qu sheng readings.
166    ^           2      based on the information given in Guang yun,
                          ^ is to be marked as ^ , and ^ is to be marked
                          as ^ . It seems that in the zihui ^ is intended
                          to be the abbreviated form of ^ .









page   character   cell   comment

167    ^                  -am^ does not appear in the inventory of finals.
169    ^                  ^ should be ^ .
169    ^                  -lam does not appear in the inventory of finals.
170    ^           2      Guang yun also gives ^ . The pronunciations of
                          ^ and ^ probably reflect this form.
171    ^           2      ^ should be ^ . However, Guang yun gives three
                          different pronunciations, two in ^ , and one
                          in ^ .
171    ^                  -lei) does not appear in the inventory of finals.
172    ^           2      ^ should be ^ .
172    ^           2      Both Ji yun and Guang yun suggest the initial ^
                          for the character ^ .
176    ^           2      should be ^ .
179    ^           2      should be ^ .
183    ^                  lacks tone mark.
183    ^                  lacks tone mark.
184    ^           2      should be ^ .
185    ^                  should be ^ .
185    ^                  should be ^ .
185    ^                  should be ^ .
187    ^                  h- does not appear in the inventory of initials.













page   character   cell   comment

189    ^                  ela could be a mistake for ^ .
190    ^                  cla could be a mistake for ^ .
190    ^                  lacks tone mark.
193    ^           2      should read: ^ .
194    ^           2      ^ should be ^ .
195    ^           2      should read: ^ . Guang yun gives ^ for this
                          character.
195    ^                  nasalization mark omitted (?).
195    ^                  nasalization mark omitted (?).
196    ^                  nasalization mark omitted (?).
199    ^           2      ^ should be ^ .
200    ^           2      should be ^ .
202    ^           2      Guang yun gives both ping sheng and shang sheng
                          for this character.
203    ^                  -en does not appear in the inventory of finals.
203    ^                  -en does not appear in the inventory of finals.
203    ^                  -an does not appear in the inventory of finals.
205    ^           2      should be ^ .
209    ^           2      should be ^ .
210    ^           2      should be ^ .
211    ^           2      should be ^ .
212    ^                  lacks tone mark.






page   character   cell   comment

212    ^                  lacks tone mark.
212    ^                  lacks tone mark.
212    ^                  -St) does not appear in the inventory of finals.
212    ^                  ts'- does not appear in the inventory of initials.
213    ^           2      should be ^ .
213    ^           2      should be ^ .
214    ^           2      should be ^ .
214    ^           2      ^ should be ^ .
214    ^                  lacks tone mark.
214    ^                  lacks tone mark.
216    ^           2      ^ should be ^ .
217    ^           2      ^ should be ^ .
226    ^                  -Te does not appear in the inventory of finals.
227    ^           2      ^ should be ^ .
231    ^                  -IT does not appear in the inventory of finals.
232    ^                  lacks tone mark.
232    ^                  lacks tone mark.
240    ^                  -0 does not appear in the inventory of finals.
240    ^                  lacks tone mark.
255    ^           2      should be ^ .









page   character   cell   comment

255    ^           2      ^ should be ^ .
256    ^           2      ^ should be ^ .
258    ^           2      ^ should be ^ .
260    ^           2      Guang yun gives ping sheng and qu sheng for the
                          character ^ .
260    ^                  ^ lacks tone mark.
262    ^           2      should be ^ .
265    ^                  ts'an lacks tone mark.
265    ^                  -ui) does not appear in the inventory of finals.
265    ^           2      should be ^ .






References 



Dong, Tong-he. 1954. History of Chinese phonology. Taipei, China
Culture Publishing Foundation. No. 26 of Xiandai guomin jiben zhishi
congshu. (In Chinese.)

Li, Charles N. 1966. Workpaper 3: A coding system for Chinese characters.
University of California, Berkeley. (Unpublished.)

Li, Charles N. 1967. Report on coding and programming information in the
dialect dictionary, Hanyu fangyin zihui. (Unpublished report.)

Peking University. 1962. Hanyu fangyin zihui [A lexicon of the Chinese
dialects]. Peking, Wenzi Gaige Chubanshe. (In Chinese.)

Shi, Wen-tao. 1963. Review of A lexicon of the Chinese dialects.
Zhongguo yuwen 123.176-82. (In Chinese.)

Wang, William S-Y. 1966. Workpaper 1: Rime dictionaries. University of
California, Berkeley. (Unpublished.)

Wang, William S-Y. 1966. Workpaper 2: Dialect dictionaries. University of
California, Berkeley. (Unpublished.)













Footnotes

1 The preparation of this report as well as the project described
therein was supported by National Science Foundation Grant GS1430.

2 Workpaper 2 (Wang 1966) presented a system of coding the phonetic
data in the Zihui. The problem of coding the Chinese logographs as well
as some Ancient Chinese categorizations was left to be solved by Charles
N. Li.

3 The Hanyu fangyin zihui (Peking University 1962) is one of the
most comprehensive Chinese dialect surveys available.

4 Charles Li (Li 1966) devised an alphanumeric code for Chinese
logographs which analyzed the logographs into strokes, each specified by
a code letter; the relative position of the stroke within the logograph
was indicated by reference to the horizontal and vertical axes.

5 The Chinese Telegraphic Code can be found in, for example, the
Modern Chinese-English technical and general dictionary (McGraw-Hill, 1963),
Vol. 1.

6 The choice of Dong's reconstruction (Dong 1954) was entirely
arbitrary; our coding scheme will in no way prejudice the results since it
faithfully preserves the existing distinctions without necessarily
specifying what constituted these distinctions.

7 This section of the report, as well as Appendix II, was written by
David Forthoffer, a programmer for the project.



